# Replication Package: Job-Training Conjoint

Replication data and code for the **job-training conjoint** analyses (Appendix C, Figures C1-C10) from:

> Brehm, Robin, Tianzi Zhou, and Steven Denney. 2025. "From Division to Democracy: Integrating Post-Socialist Citizens in Germany and South Korea." *Communist and Post-Communist Studies*. DOI: [10.1525/cpcs.2025.2636997](https://doi.org/10.1525/cpcs.2025.2636997)

Data are permanently archived on [Harvard Dataverse](https://doi.org/XXXX). This GitHub repository provides the living, editable version of the replication code.

## Contents

- `data/germany_job_training.csv` -- Germany conjoint + survey data
- `data/korea_job_training.csv` -- South Korea conjoint + survey data
- `docs/data_dictionary_germany.md` -- Variable definitions (Germany)
- `docs/data_dictionary_korea.md` -- Variable definitions (South Korea)
- `code/analysis.R` -- Full analysis script (derives variables, estimates models, produces figures)
- `code/prepare_data.R` -- Data provenance script (requires original Qualtrics exports; not needed for replication)

## Data structure

Each CSV contains **raw survey question responses** alongside conjoint task data. Each row is one candidate profile shown in a conjoint task. Variables include:

- **Respondent ID** and task identifiers
- **Conjoint attributes**: Age, Family, Gender, Occupation, Record, Origin (translated to English)
- **Demographics**: age, year of birth, gender, geographic location, education, ethnicity/political identification
- **Pre- and post-treatment survey items**: national identity strength, ethnocentrism items, policy attitudes, political orientation

All respondent-level variables are raw survey responses in the original survey language (German or Korean). The analysis script derives subgroup variables (e.g., median splits, regional classifications) from these raw inputs. See the data dictionaries for variable definitions, survey question mappings, and response value translations.
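As an illustration of what deriving a median-split subgroup can look like, here is a minimal base-R sketch on toy data. The column names (`respondent_id`, `identity_strength`, `identity_group`), the already-numeric coding, and the tie rule (strictly above the median counts as "High") are assumptions made for the example. They are not taken from `code/analysis.R` or the data dictionaries.

```r
# Toy profile-level data: two rows (profiles) per respondent.
# Column names and numeric coding are hypothetical.
profiles <- data.frame(
  respondent_id     = rep(1:4, each = 2),
  identity_strength = rep(c(2, 5, 3, 4), each = 2)
)

# Compute the cutoff on one row per respondent, so that respondents
# are not weighted by the number of profiles they saw.
resp   <- unique(profiles[, c("respondent_id", "identity_strength")])
cutoff <- median(resp$identity_strength, na.rm = TRUE)

# Assumed tie rule: strictly above the median is "High".
profiles$identity_group <- factor(
  ifelse(profiles$identity_strength > cutoff, "High", "Low"),
  levels = c("Low", "High")
)

table(profiles$identity_group)
```

Because the raw responses are stored in the original survey language, a real derivation would first need to recode them to numeric values using the translations in the data dictionaries.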

## Requirements

- R (tested with 4.5.1)
- Packages: `tidyverse`, `cregg`, `ggthemes`

## Run

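To install dependencies and run the full replication in one step, source the wrapper script with the repository root as the working directory: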
```r
source("run_replication.R")
```

Or install manually and run the analysis directly:
```r
install.packages(c("tidyverse", "ggthemes", "remotes"))
remotes::install_github("leeper/cregg")
source("code/analysis.R")
```

Or from the command line:
```sh
make
```

Outputs are written to `output/`, with figures in `output/figures/`. The file `output/sessionInfo.txt` records the package versions used.

## SI figure name mapping

- `Figure_C1_DEU` / `Figure_C1_KOR`: Marginal means for Germany and South Korea
- `Figure_C2`: Marginal means for Germany including East German respondents (ethnic Germans only)
- `Figure_C3_DEU` / `Figure_C3_KOR`: Assistance attitudes subgroups
- `Figure_C4`: Germany regions
- `Figure_C5`: South Korea regions
- `Figure_C6`: Western German generations
- `Figure_C7`: Germany education
- `Figure_C8`: South Korea education
- `Figure_C9`: Germany political ID
- `Figure_C10`: South Korea political ID
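For readers unfamiliar with the quantity plotted in these figures, a marginal mean is the average outcome across all profiles that show a given attribute level. The base-R sketch below computes it on toy data. The outcome column `selected` and the toy values are hypothetical. The `cregg` package listed under Requirements provides `mm()` for this estimate with respondent-clustered uncertainty, but how `code/analysis.R` calls it is not shown here.

```r
# Toy conjoint data: one row per profile, `selected` = 1 if the profile was chosen.
# The outcome name and values are made up for illustration.
profiles <- data.frame(
  respondent_id = rep(1:3, each = 4),
  Gender        = rep(c("Female", "Male"), times = 6),
  selected      = c(1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1)
)

# Marginal mean per attribute level: mean of the outcome within each level.
mm_gender <- aggregate(selected ~ Gender, data = profiles, FUN = mean)
print(mm_gender)
```

This sketch gives point estimates only. Uncertainty estimates need to account for multiple profiles per respondent, which is what clustering on the respondent ID does.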

## Sample sizes (post quality checks)

- Germany: **1,882** respondents (passed the attention check and have non-missing national identity strength)
- South Korea: **1,768** respondents (passed the attention check)

## Notes on analysis samples

- The manuscript's main conjoint results focus on **Western Germans** and **ethnic Germans**. The analysis script applies these filters after deriving the relevant classification from raw survey responses.
- East/West and Berlin identifiers are derived from the `state_at_18` variable (Q5) to reproduce SI subgroup analyses.
- The conjoint `Age` attribute uses levels `25`, `35`, `46`, `62`, matching the manuscript design.
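The East/West/Berlin derivation from `state_at_18` can be sketched as below. This is an illustration under assumptions, not the code in `code/analysis.R`: it assumes the column holds German-language federal state names, and the helper `classify_region()` and the output labels are hypothetical. Check the Germany data dictionary for the actual response values.

```r
# Federal states by region; Berlin is kept as its own category.
east_states <- c("Brandenburg", "Mecklenburg-Vorpommern", "Sachsen",
                 "Sachsen-Anhalt", "Thüringen")
west_states <- c("Baden-Württemberg", "Bayern", "Bremen", "Hamburg", "Hessen",
                 "Niedersachsen", "Nordrhein-Westfalen", "Rheinland-Pfalz",
                 "Saarland", "Schleswig-Holstein")

# Hypothetical helper: anything not matched (missing, abroad, other) stays NA.
classify_region <- function(state) {
  region <- rep(NA_character_, length(state))
  region[state %in% west_states] <- "West"
  region[state %in% east_states] <- "East"
  region[state %in% "Berlin"]    <- "Berlin"
  region
}

toy <- data.frame(
  state_at_18 = c("Bayern", "Sachsen", "Berlin", NA, "Hessen"),
  stringsAsFactors = FALSE
)
toy$region <- classify_region(toy$state_at_18)
print(toy)
```

Leaving unmatched values as `NA` rather than defaulting them to "West" keeps respondents with missing or non-German answers out of the regional subgroups.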
